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Abstract 


We present a new methodology and accompanying theory to test for separability 
of spatio-temporal functional data. In spatio-temporal statistics, separability is a 
common simplifying assumption concerning the covariance structure which, if true, 
can greatly increase estimation accuracy and inferential power. While our focus is on 
testing for the separation of space and time in spatio-temporal data, our methods can 
be applied to any area where separability is useful, including biomedical imaging. We 
present three tests, one being a functional extension of the Monte Carlo likelihood 


method of Mitchell et al. (2005), while the other two are based on quadratic forms. 
Our tests are based on asymptotic distributions of maximum likelihood estimators, and 
do not require Monte Carlo or bootstrap replications. The specification of the joint 
asymptotic distribution of these estimators is the main theoretical contribution of this 
paper. It can be used to derive many other tests. The main methodological finding 
is that one of the quadratic form methods, which we call a norm approach, emerges 
as a clear winner in terms of finite sample performance in nearly every setting we 
considered. The norm approach focuses directly on the Frobenius distance between the 
spatio-temporal covariance function and its separable approximation. We demonstrate 
the efficacy of our methods via simulations and an application to Irish wind data. 


1 Introduction 


The assumption of separability is used heavily in spatio-temporal statistics, Haas (1995) 


Genton (2007), Hoff (2011), Paul and Peng (2011) Sun et al. (2012) among many others. It 


is introduced in many textbooks, e.g. Schabenberger and Gotway (2005), Sherman (2011) 


Separability means that the spatio-temporal covariance structure factors into the product 
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of two functions, one depending only on space, the other only on time. Such an assumption 
provides a number of benehts. From a modeling perspective, it allows one to draw on the 
large literature on covariance structures for spatial or temporal data. The simpler structure 
induced by separability is then much easier to estimate than a nonseparable structure. In 
the context of multivariate spatio-temporal data, the separability assumption can be stated 
in terms of the factorization of the covariance matrix. For more complex spatio-temporal 
data structures, analogous dehnitions can be formulated, as we explain below. The work 
presented in this paper is motivated by geo statistical functional data] functions are observed 
at a number of spatial locations, though our methods can be readily generalized to a number 
of areas. For example, in biomedical imaging, such as fMRI, one often has data in both space 
(the brain) and time, separability can greatly simplify modeling. In our context, separability 
implies that the optimal functions used for (temporal) dimension reduction are the same at 
every spatial location; information can then be pooled across spatial locations to get very 
good estimates of these functions. Geostatistical functional data are quite common. Perhaps 
the best known example is provided by annual temperature and log-precipitation curves (av¬ 
eraged over several decades) at several dozen locations in Canada. These data have been used 


in many examples in the monograph of of Ramsay and Silverman (2005) and many research 


papers that followed, Delicado et al. (2010) provide further references. Our own work has 


been concerned with such data as well; Gromenko et al. (2012) and Gromenko and Kokoszka 


(2012, 2013) study curves describing the evolution of certain ionospheric parameters mea¬ 


sured at globally distributed locations at which radar-type instruments called ionosondes 
operate. In Gromenko et al. (2015)| we study precipitation measurements extending over 
several decades at about sixty locations in the Midwest. 

Tests of separability for spatio-temporal covariances of sca/arhelds are reviewed in Mitchell 
et al. (2005, 2006| and Fuentes (2006) If the spatio-temporal covariance has a specihc para¬ 
metric form, a likelihood ratio test is possible. A similar parametric approach, in conjunction 


with bootstrap, is taken by Liu et al. (2014) in the context of functional data. Mitchell et al 


(2006) introduce a more general, nonparametric LRT test, which requires that the number of 


repeated measurements be greater than the product of the number of spatial and temporal 
locations. We explain their idea in greater detail in the following. Mitchell et al. (2005) 


explain how to deal with this restrictions by dividing the temporal domain into blocks. The 


test of Fuentes (2006) is based on the spectral representation which assumes that the data 
are available on a spatial grid. The data that motivate this research are not of this form. 

We now explain the contribution of this paper in more specihc terms. It is instructive to 
begin by summarizing the procedure of Mitchell et al. (2006) Suppose we observe N iid 
scalar helds at temporal points U and spatial locations Sk, so that the data are replications 
of the spatio-temporal observations: 

Xn{sk] ti), I <k < K, I <i < I. 

The iid assumption implies that the mean function EXn{s]t) = /i(s;f) is the same for each 
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replication. The covariances 




do not depend on n either. The assumption of separability implies that = UkeVij, where 
Uki does not depend on t (time) and Vij does not depend on s (space). This relation stated 
in the matrix form as 


( 1 . 1 ) 


s = V®u = 


^iiU 

Vi2^ . . 

.. nuU 

UsiU 

V22^ ■ - 

.. Ua/U 

viiV 

vnV .. 

.. vnV 


where U is the K x K matrix with entries um and V is the I x I matrix with entries Vij. 
The matrix H is KI x KI and can be viewed as the covariance matrix of the vectorized 


matrix {X^isk^U)} (with k indexing rows and i columns). Mitchell et al. (2006) use the test 
statistic 


( 1 . 2 ) 


T = N\ /flog del [V| + /logdet|U] - logdet|S| 


where V, U, S are Gaussian likelihood estimates defined in Theorem 2.1 Their approach is 


based on Theorem IM which justifies a Monte Carlo approximation for the null distribution 
of T. One can, e.g., use /i = 0, U = V = 1/ to obtain a large number of replicates of T, 
and so approximate its null distribution. 


Theorem 1.1 (Mitchell et al. (2006)). If the observations are normal, (1.1) holds, and 


N > KI, then the distribution ofT defined in (1.2) does not depend on /i,U,V. 


The choice of statistic (1.2) is thus fundamentally justified by the invariance property 


stated in Theorem 1.1 Perhaps more natural test statistics should be based on some distance 
between the matrices S and V®U. It might be hoped that a more direct comparison would 
lead to tests with better power. However, such statistics are not invariant in the sense of 


Theorem IM and their asymptotic distribution has not been found. The first contribution 
of this paper is to derive the joint asymptotic null distribution of S,V,U and show how 
it enables to derive the limit distribution a several natural test statistic in the multivariate 
context. This is addressed in Section The proofs are presented in the Appendix. 

The second contribution, which motivated our research, is related to functional data which 
are N replications of the held 


Xn{sk,t), l<k<K, teT. 

At each spatial location s^., a function with argument t is observed. For example, Xn{sk,t) 
can be the maximum daily temperature on day t of year n at location s^. For historical 
climate and environmental data sets of this type, N is about 100, K can be anything from 
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several dozen to a few hundred, and the number of measurements per year is J = 365. The 
approach of Mitchell et al. (2006) thus cannot be applied because the condition N > KI is 
violated. One cannot subdivide the years into smaller units because the assumption of the 
identical distribution would be violated, so the modihcation of Mitchell et al. (2005)| cannot 
be applied. Our approach exploits the functional structure of the data and uses a dimension 
reduction. Since for historical climate and environmental data sets N is fairly large, we 
derive asymptotic tests as —)■ oo using the results of Section These developments are 
described in Section [H 

The only related work we are aware of which also concerns functional data is given in 
the (currently) unpublished work of Aston et al. (2015) Since both papers, developed 


independently and concurrently, are currently unpublished, it is only appropriate to discuss 
differences and similarities, refraining from any evaluative statements. Both papers aim at 
solving the same problem, with motivation coming from different data structures. Our work 
is motivated by geostatistical functional data, curves observed at irregularly distributed 
spatial locations; Aston et al. (2015) are motivated by data observed on grids with one 
dimension which can be called space and the other time. In their application to phonetic 
data, space is frequency. Our approach is based on the joint asymptotic distribution of the 
MLE’s, followed, as an option, by dimension reduction; Aston et al. (2015) hrst perform 
dimension reduction in space and time which allows them to compute their tests statistic 
without estimating the full spatio-temporal covariance. Our method uses an approximation 
via limiting distributions; they use bootstrap approximations. The computational efficiency 
is thus obtained in very different ways. Regarding asymptotic theory, ours is based on joint 
MLE’s which require an iterative procedure to compute; Aston et al. (2015) compute the 
marginal space and time covariances by integrating out the other dimension, and then apply 
the CLT to the difference projected on a hnite number of tensor products. 

The remainder of this paper is organized as follows. Section compares the hnite sample 
performance of the tests derived Section We show that a test related to a norm of the 
difference S — V 0 U has correct size and is most powerful. It is also more powerful than a 
Monte Carlo test based on Theorem O (with a suitable extension to functional data). We 
apply the tests to an extensively studied Irish wind data set, and conhrm the conjecture of 
T. Gneiting that these space-time data are not separable. 


2 Multivariate theory 


This section clarihes the behavior of several test statistics based on normal maximum likeli¬ 
hood estimators. The theory is valid under the following assumption. 


Assumption 2.1. Assume Xi,... ,Xjv are iid normally distributed K x I matrices with 
E[X„] = M and Cov(vec(X„)) = S, where S is an KI x KI matrix of full rank. 


We begin with Theorem |2.1 whose proof is based on direct, but lengthy and tedious 
calculations of Gaussian likelihoods for vectorized matrices of various dimensions and solving 
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score equations. It is placed in the supplement. Recall that if A is a R' x / matrix, then 
vec(A) is a column vector of length KI obtained by stacking the columns of A on top of 
each other. 


Theorem 2.1. Under Assumption 2.1, the maximum likelihood estimators o/M and S are 


_ 1 ^ ^ 1 ^ ^ ^ 

M = - S = ^ vec(X„ - M)(vec(X„ - M))”^. 

n=l n=l 

If S admits the decomposition 

(2.1) S = V ® U, dim(U) = K x K, dim(V) = 1x1, 

where U and V are of full rank, then the maximum likelihood estimators o/U and V satisfy 

. N 




n=l 

N 


- M), 


n=l 


The estimators U and V are dehned indirectly and must be computed using an itera¬ 
tive procedure with some normalization to ensure identihability. The following algorithm, 


Dutilleul (1999), produces Op{N ^/^) consistent estimators. It uses the normalization 


tr(U,) = K. 

Algorithm 2.1. Initialize with Uq = Ik (K x K identity matrix). For i = 0,1,2,... 
calculate 


N 


Yi = 


NK 


5^(X„-M)TUri(X„-M), 


n=l 

, _ _ 

U.+i = IT, - M)vy(X„ - M)T. 

n=l 


tr(Ui+i) 


-U, 


i+l 1 


until convergence is reached. 


The most natural statistic to test separability, i.e. S = V (8) U, should be based on a 
difference between V 0 U and S. We will show that the statistic 

f^ = 7V||V0U-S|||, 

where || ■ ||p is the squared Frobenius matrix norm (i.e. the sum of squares of all entries), 
converges, and hnd the asymptotic distribution. This distribution involves the asymptotic 
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covariance matrix of vec(V ® U — S), which we denote by W. The form of W is complex, 


see (B.l). To obtain a chi-sqnare limit distribntion, a snitable qnadratic form must be used. 


This leads to the statistic 

fw = Nvec(V 0 U - S)^W+vec(V 0 U - S), 

where W is an estimator of W and W+ is its generalized inverse. A generalized inverse must 
be used because U, V, and S (and the corresponding estimates) are all symmetric, and this 
implies many linear constraints, Uke = U^k for example, on the entries of vec(V 0 U — S) 
and so of W. Using a generalized inverse is equivalent to dropping redundant entries. 

We will also show that the likelihood ratio statistic 


Tl = N (^Tlogdet(U) + A:iogdet(V) - logdet(S) 

discussed in Section has the same limit as Tw^, i.e. is asymptotically chi-square with a 
known number of degrees of freedom. The asymptotic chi-square distribution of Tl was 
claimed by Lu and Zimmerman (200^ without proof. We present a detailed proof in Sec¬ 
tion]^ [Mitchell et al. (2006) did not use this asymptotic result; they utilized a Monte Carlo 
hnite sample approximation based on Theorem 1.1[ We collect our results in Theorem 2.2 


Theorem 2.2. Suppose Assumption 2.1 and decomposition (2.1) hold. Let"W be the KI x 


KI matrix defined in (B.l), with 71 , 72 , • • • , 7 k its eigenvalues. Then, as N ^ 00 , 


xl 


and Tt 


w 


xl 


where 


and 


^_KI{KI + l) K{K + l) /(J + 1) , ^ 
d — T 1 , 


R 


fp ~ 5^7rX?W, 

r=l 

where {Xi(n)} are iid chi-square random variables with one degree of freedom. 

Theorem |2.2| is proven in Section The three statistics listed in Theorem 2^ are not 


the only ones that our theory covers. Theorem B.l which specifies the joint asymptotic 
distribution of V, U and S, can be used to derive the asymptotic distribution of many other 
reasonable test statistics. 


3 Tests for functional data 

We now show how the results of Section [^ are applied to testing separability of geostatistical 
functional data. For a reader interested in learning more about functional data methods there 


are now several introductory books including Ramsay and Silverman (2005); Ramsay et al. 
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(2009) Horvath and Kokoszka (2012); Hsing and Euba nk (2015) We consider independent 


spatio-temporal random fields X„(-,-),l < n < N, which have the same distribntion as 
the field {X{s,t),s G G T}, which satisfies E J^X‘^{s,t)dsdt < oo. Then X„(s,t) = 
/i(s,t) + £„(s,t), where /i G L‘^{S x T), and the Sn are iid random elements of L^(5 x T) 
which satisfy 

e‘^{s,t)dsdt < oo and Een{s,t) = 0. 



5 JT 


We consider the covariances 


cr(s,s';t,T) = Cov(X(s,t),X(s',t')) = E[e:„(s; t)e:„(s', E)]. 

Onr objective is to test 

(3.1) Eq ■ o-(s,s';t,t')=U(s,s')y(t,t'). 

As in the mnltivariate case, the fnnctions W and V are nniqnely determined only np to 
mnltiplicative constants, and a testing algorithm mnst inclnde some arbitrary normalization. 
However, the P-valnes of onr testing procednres do not depend on this choice. We now 
proceed with the description of test procednres of increasing complexity. 


3.1 Procedure 1: fixed spatial locations, fixed temporal basis 

We assnme that for each n the field is observed at the same spatial locations s^, 1 < fc < 
K. We estimate /r(sfc,f) by the sample average = N~^ J2n=i , and focns in 

the following on the covariance strnctnre. Under Hq, the covariances of the observations are 

Coy{X4sk,t)X4s,,f)) = U{k,i)V{t,t'), 


with entries U{k,i) = U{sk,Si) forming a. K x K matrix U, and V being the temporal 
covariance fnnction over TxT. As in the mnltivariate setting, the estimation of the matrix 
U and the covariance fnnction V mnst involve an iterative procednre. In the fnnctional 
setting, a dimension rednction is also needed. Snppose {vj,j > 1} is a basis system in L‘^{T) 
snch that for snfficiently large J, the fnnctions 

.1 

= /i(Sfc,f) + ^On(Sfc)nj(t) 

J=1 


are good approximations to the fnnctions X„(sfc). We thns replace a large nnmber of time 
points by a moderate nnmber J, and seek to rednce the testing of Hq (3.1) to testing the 
separability of the covariances of the transformed observations given as K x J matrices 


(3.2) = [^jn{sk), l<k<K, l<j<J]. 

The index j shonld be viewed as a transformed time index. The nnmber I of the actnal 
time points ti can be very large, J is nsnally mnch smaller. The following proposition is 
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easy to prove. It establishes the connection between the testing problem (3.1) and testing 
the separability of the transformed data (3.2). The assumption that the Vj are orthonormal 
cannot be removed. 

Proposition 3.1. For some orthonormal Vj, set 


If 

(3.3) 

then 

(3.4) 


j=i 


E[^^{s,)Use)] = U{k,i)V{j,t), 


CoY{X^^\sk,t),X^'^\s,,s)) = U{k,i)Vit,s). 


Conversely, (3.4) implies ( |3.3[ ). The entries V{j,i) andV{t,s) are related via 

V{j,i) = ff V{t,s)vj{t)vi{s)dtds, V{t,s) = ^V{j,i)vj{t)vi{s). 

We assume that {vj, 1 < j < J} is a hxed orthonormal system, for example the hrst J 
trigonometric basis functions. Slightly abusing notation, consider the matrices 

^{KJxKJ), ij{KxK), V(JxJ), 


dehned as in Theorem 2.1, but with matrices X„ replaced by the matrices S„. The index 
j < J now plays the role of the index z < / of Section]^ To apply tests based on The orem |2.2[ 
we must recursively calculate U and V using the relations stated in Theorem 


2.1 


This can 

be done using Algorithm 2T with in place of X„. This approach leads to the following 
test procedure. The test statistic can be one of the three statistics introduced in Section 

Procedure 3.1. 

1. Choose a deterministic orthonormal basis Vj,j > 1. 

2. Approximate each curve Xn{sk,t) by 


i=i 


Construct K x J matrices S„ dehned in (3.2). 

3. Compute the matrix 

^ 1 ^ ^ 1 ^ 
S = — ^vec(H„-M)(vec(H„-M))^, M = — 


n=l 


n=l 











Using Algorithm 2.1 with in place of X„, compute the matrices U and V. 


4. Estimate the matrix W defined by (B.l) by replacing S,U, V by their estimates. 

5. Calculate the P-value using the limit distribution specified in Theorem 2^, with I 
replaced by J. 


Step 2 can be easily implemented using R function pca.fd, see Ramsay et al. (2009) 


Several methods of choosing J are available; we used the cumulative variance rule requiring 
that J be so large that at least 80% of variance is explained for each location s^.. 


3.2 Procedure 2: fixed spatial locations, data driven temporal 
basis 


In Section 3.1, we used a deterministic orthonormal system. To achieve the most efficient 


dimension reduction, it is usual to project on a data driven system, with the functional 
principal components being used most often. Since the sequences of functions are dehned 
at a number of spatial locations, it is not a priori clear how a suitable orthonormal system 
should be constructed, as each sequence {X„(sfc), 1 < n < N} has different functional prin¬ 
cipal components Vj{sk),j > 1, and Proposition 3.1 requires that a single system be used. 
Our next algorithm proposes an approach which leads to suitable estimates U, V and S. It 
is not difficult to show that these estimators are consistent. 


Algorithm 3.1. Initialize with Uq = ^k- 

For i = 1, 2,..., perform the following two steps until convergence is reached. 
1. Calculate 

N 

V.(f, t') = {NK)-^ t) - /!(•, t))^Ur\(X„(-, t') - fi{;t')). 

n=l 


Denote the eigenfunctions and eigenvalues of V* by {up} and {Ajj}. Determine Jj such that 
the hrst Jj eigenfunctions of Vj explain at least 85% of the variance. 

2. Project each function X„(sfc, •) on the first J* eigenfunctions of V*. Denote the scores of 
these projections by 


and calculate 


Ji N 


EE 

j=l n=l 


^in j) j) 


A,; 




Normalize Uj so that tr(Uj) = K. 

Let {%, 1 < J < </} denote the final eigenfunctions. Carry out the final projection 
= (X„(sfc, •) — /i(sfc, ■),Vj). For each n, denote by the K x J matrix with these 
entries. Set 


(3.5) 


S 


1 

N 


N 

y^vec(Z„) vec(Z„)^ 

n=l 
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and apply Algorithm 


2.1 


with X„ = Z„ to obtain U and V. 


Using the above algorithm, the testing procedure is as follows: 


Procedure 3.2. 

1. Calculate matrices S,U, V according to Algorithm 


3.1 


2. Perform steps 4 and 5 of Procedure 3.1 


3.3 Procedure 3: dimension reduction in both space and time 

Similar with Procedure 1 we assume that for each n the field is observed at the same 
spatial locations s^, 1 < fc < X. We estimate /r(sfc,f) by the sample average = 

N~^ J2n=i^n{sk,t), and focus in the following on the covariance structure. Under Hq, the 
covariances of the observations are 


Cov(X„(sfc,f)X,,(s,,C)) = 


with entries U{k,i) = U{sk,Si) forming a. K x K matrix U, and V being the temporal 
covariance function over T x T. 

The testing procedure for dimension reduction in both space and time is as follows: 


Procedure 3.3. 

1. Choose a deterministic orthonormal basis Vj,j > 1. 

2. Approximate each curve Xn{sk,t) by 




i=i 


Construct K x J matrices dehned in (3.2) where J is chosen so large that for each k the 


first J sample eigenvalues explain at least 80% of the variance. This is Functional Principal 
Components Analysis carried out on the pooled (across space) sample. 

3. Approximate each vector (^j„(si),... using 


L 

^ Clj',nUl(^Sk}. 
1=1 


The vectors {ui{si),... ,ui{sk)) are the eigenvectors of the following matrix 


U,{k,i) = {NJ,) 


J^ N 

-EE 

j=l n=l 




A,; 




Construct the L x J matrices = [Qj-n, 1 < ^ < T, 1 < j < J] where L is chosen 
large enough so that the hrst L eigenvalues explain at least 80% of the variance. This is a 
multivariate PCA on the pooled (across time) variance adjusted sample. 


10 







4. Compute the matrix 


^=nY1 - M)(vec(Z„ - M))T, M = - Z, 


n=l 


n=l 


Using Algorithm 2A with Z„ in place of Xn, J in place of / and L in place of K compute 
the matrices U and V. 


5. Estimate the matrix W defined by (B.l) by replacing S,U, V by their estimates. 

6. Calculate the P-value using the limit distribution specified in Theorem 2.2, with I 
replaced by J and K replaced by L. 

Step 2 can be easily implemented using R function pca.fd and step 3 by using R function 
prcomp. 


4 Finite sample comparison and application to Irish 
wind data 


We now compare the performance of the tests based on statistics introduced in Section]^ and 
procedures introduced in Section We include in the comparison the modihed approach of 
Mitchell et al. (2006) which is based on Theorem O and the spatial principal components 


introduced in Section |3.3[ We tabulate the results for the most general approach described 
in Section 13.31 which also leads to most accurate tests for the simulated data we used. 


The relative ranking of the tests remains the same if the approaches of Sections |3.1| and 
3.2| are applied; these approaches perform best if the number of spatial locations is small. 


The approach based on Theorem 1.1 can typically be applied only in conjunction with the 


procedure of Section [373] so that the condition N > LJ holds. 

We Thus consider four test procedures applicable to space-time functional data, which 
we denote Tl,Tp,T\y and Tl_mc- The first three procedures use asymptotic critical values 


of limit distributions specified in Theorem 2.2 Tp_Mc uses the Monte Carlo critical values 
computed using /x = 0, U = 1^, V = Ij. 

To generate data, we use the following spatio-temporal covariance function introduced by 
Gneiting (2002)[ 


(4.1) 


cr(s, s', f, t') = 


a 


(a|f-f'|2“ + 1)^ 


exp 


c s 


- «'I|27 


{a\t - + 1)/37 


In this covariance function, a and c are nonnegative scaling parameters of time and space 
respectively, a and 7 are smoothness parameters which take values in ( 0 , 1 ], (3 is the sepa¬ 
rability parameter which takes values in [ 0 , 1 ], > 0 is the point-wise variance and hnally 

r > 13d/2, where d is the spatial dimension. We focus on the effect of the space-time inter¬ 
action parameter, /? G [ 0 , 1 ]. \i (3 = 0 , the covariance function is separable. As (3 increases 
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the space-time interaction becomes stronger. We set 7 = 1, a = 1/2, = 1, a = 1, c = 1 

and r = 1 so the covariance function becomes: 


cr(s, s', t, t') 


1 ( ||s-s'|p ^ 


We use / = 100 time points equally spaced on [0,1] and = 11 space points on a grid 
The number K = 11 is motivated by the Irish wind data considered by 
which we also study below in this section. We will consider different values 
of the parameter /3 as well as the number of spatial PC’s, L, and temporal FPC’s, J. We 
will also consider different values for the sample size N. All empirical rejection rates are 
based on one thousand replications, so their precision is about 0.7 percent for size (we use 
signihcance level of 5%), and about two percent for power. 

We study three different scenarios. The hrst scenario considers different values of (3. The 
second scenario examines the effect of the sample size N, while the third scenario the effect 
of the number of principal components. Each table reports the rejection rates in percent. 

• Scenario 1: = 100, L = J = 2, /? = 0, 0.5,1. 



Tl-mc 

Tl 

Tp 

Tw 

/3 = 0 

5.1 

5.9 

4.5 

3.7 

13 = 0.5 

12.3 

13.4 

15.2 

5.4 

P = 1 

54.1 

55.8 

63.4 

32.3 


in [0,1] X [0,1]. 
Gneiting (2002) 


The test Tp has the best balance of size and power. The test T\y is a bit conservative here, 
however we will see in Scenario 3 that this pattern is not consistent. The two likelihood 
methods do not exhibit signihcantly different rejection rates. 

• Scenario 2: N = 100,150, 200, L = J = 2, /3 = 1. 



Tl-mc 

Tl 

Tp 

Tw 

N = 100 

54.1 

55.8 

63.4 

32.3 

N = 150 

68.0 

68.3 

79.8 

29.8 

N = 200 

80.4 

80.8 

91.5 

52.0 


As the sample size increases, the empirical power is also increasing, with the Tp-test pre¬ 
serving its lead in terms of power. 

• Scenario 3: N = 100, (3 = 0,1, L + J increasing. 
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/? = 0 

Tl-mc 

Tp 

Tp 

Tw 

L = J = 2 

5.1 

5.9 

4.5 

3.7 

L = 2, J = 3 

5.5 

6.5 

5.3 

29.5 

L = 3, J = 2 

4.6 

5.4 

4.5 

10.2 

L = J = 3 

4.4 

6.9 

4.7 

39.4 

L = J = 4 

5.2 

13.6 

5.2 

98.1 

/3 = 1 





L = J = 2 

54.1 

55.8 

63.4 

32.3 

L = 2,J = 3 

52.3 

55.0 

75.5 

91.6 

L = 3,J = 2 

47.3 

50.0 

74.6 

59.9 

GO 

54.7 

61.9 

89.1 

95.8 

L = J = A 

79.0 

90.5 

99.7 

100.0 


Only the tests T^-mc and Tp are robust to the number of the principal components used. 
This a a very desirable property, as in all procures of FDA there is some uncertainty as the 
actual number of PC’s that should be used. The test Tp is more powerful than Tp_Mc- 
Our overall conclusion is that the norm based test, Tp, works better than the other ap¬ 
proaches. This is due to the fact that it targets the difference between S and U ® V most 
directly. The application of this norm based approach is possible because we have derived 
the asymptotic distribution of Tp. Even though this distribution is very complex (the matrix 
W defined in (B.l) has a complex structure), once the algorithm is coded, the test can be 
applied without difficulty. 


We conclude this section by considering the Irish wind data of Haslett and Raftery (1989) 


which consists of daily averages of wind speeds at 11 synoptic meteorological stations in 
Ireland during the period 1961 — 1978. The data are available at Statlib, 
http://hb.stat.cmu.edu/datasets/wind.data, The geographical locations of the stations are 
shown in Haslett and Raftery (1989) they are fairly uniformly distributed over Ireland. 
Each functional observation X„(sfc, t) consists of the average of wind speed for day t, month 
n {N = 216), and at location Sk- The left panel of Figure shows the daily averages for the 
11 stations for January 1961. The right panel shows a functional box plot of the same data 
(Sun and Genton, 2011). 


Gneiting (2002) | estimated model (4.1) on these data and obtained {3 = 0.61, which indi¬ 


cates a nonseparable covariance structure. We applied our tests to validate this conjecture. 
All four tests produced P-values smaller than lOE-4 for all T, J G {2,3,4}. The P-values 
of the Tp-test were all smaller than lOE-104 and of the T^p-test smaller than lOE-21. The 
Tp-test had the largest P-values (but still extremely small). 

While the tests fully validate the conclusion of Gneiting (2002), we provide another illus¬ 
tration by applying them to residual curves obtained after removing the monthly mean from 
each curve; we center all January months, February months, etc., separately. This simple 
transformation removes to a large extent the annual seasonality, and it is interesting to see 
if a nonseparable structure is still needed for the data so transformed. The P-values for 
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Figure 1: Irish wind speed curves for January 1961. Each is measured at a different location. 
The left panel plots the functional observations while the right gives a functional boxplot. 


selected combinations of L and J are shown in Table We now see a much more interesting 
pattern. Only when L and J are larger than 3, do we see a clear evidence for nonseparability. 
A possible explanation is that the covariance structure is made up of two components, one 
which is separable and one which is not. The separable component makes up the majority 
of the variation in the process which is why the separability is not seen for smaller values of 
L and J. The pattern of dependence of the P-values on L and J is inconsistent with what 
we have seen in Scenario 3 above, but this may be due to the specihc parameter values in 
(4.1) we used in the simulations. 



Tl-mc 

Tl 

Tf 

Tw 

to 

0.06 

0.055 

0.039 

0.034 

L = J = 3 

0.467 

0.436 

0.149 

0.203 

L=J = 4: 

<10E-6 

8.079E-24 

4.237E-09 

2.146E-09 


Table 1: 


P-values for the separability tests applied to deseasonalized wind speed data. 
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A Derivation of the Q matrices 


This section introduces four matrices that describe the covariance structure of products 
of various vectorized matrices consisting of standard normal variables. We refer to them 
collectively as “Q matrices”, as we use the symbol Q with suitable subscripts and superscripts 


ic distribution of the vectorized 
In particular, the asymptotic 


2.2 


to denote them. These matrices appear in the asympto 
matrices U, V, S, which, in turn, is used to prove Theorem^ __ 

distribution of statistic Tp, which we recommended in Section [d, is expressed in terms of 
these Q matrices. Some of them are defined though an algorithm. 

Theorem A.l. //E is an K X I matrix of standard normals, then 


(A.l) 

where 


Cov(vec(EE^)) = 21 Qk- 


Qic(bj) = < 


i=j = k + {k — 1)K for k = 1,... ,K 
i=j^k + {k — 1)K for k = 1,..., K 


1 i ^ j = 1 + 


p-i)-(p-i) mod K) 


K 


+ ((i - 1) mod K)K 


0 otherwise, 

Proof. Denote by Cki the independent standard normals, and set 


Gfc — [cfei, efc2, ■ ■ ■, e-kiY 1 1 < ^ < A', 


so that 


E = 


A 

■'K. 


Then for any i,j in {1,..., A} we have that the {i,j) entry of Cov(vec(EE^)) can be written 


as 


Cov(eJei,,e4eiJ, 


where we have the relationships 

i = ki P {fi — 1)K, ki = ((i — 1) mod K) + 1, /i = 1 + 
j = k 2 + ih - l)K, k 2 = {{j - 1 ) mod K) + 1, h = I + 


{i — 1) — {i — 1) mod K 
K 

(j - 1) - (i - 1) mod K 
K 


For the diagonal terms, i.e. i = j, we have two settings ki = k 2 = h = h, in which case 
the covariance is 21, or alternatively ki = k 2 ^ h = I 2 in which case the covariance is I. 
Since the former occurs in every term, we have established the proper pattern for the 
diagonal. 
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We now need only establish the pattern for the off diagonal. Every term in the off diagonal 
can be expressed as Cov(e7ej, eje;), for some i, j, /c, / = 1,..., K. Clearly, if any one index 
is different from the other 3, then the covariance is 0. We can’t have all four indices being 
equal as that would be a diagonal element, and we can’t have i = k^j = k as that would 
also be a diagonal element. If i = j and k = I then two inner products are independent, 
and thus the covariance is zero. Therefore, the only nonzero off diagonal entries occur when 
i = I ^ j = k, and the covariance would be I. To determine where in the x matrix 
these occur, we use the change of base formulas. □ 


We illustrate the form of the matrices Qk- 

/l.O 0 0 o\ 

_ 0 0.5 0.5 0 

0 0.5 0.5 0 



/l.O 

0 

\ 

0 

0 

0 

0 

/ 

0 

0 

0 \ 


0 

0.5 

0 

0.5 

0 

0 

0 

0 

0 


0 

0 

0.5 

0 

0 

0 

0.5 

0 

0 


0 

0.5 

0 

0.5 

0 

0 

0 

0 

0 

Q3 — 

0 

0 

0 

0 

1.0 

0 

0 

0 

0 


0 

0 

0 

0 

0 

0.5 

0 

0.5 

0 


0 

0 

0.5 

0 

0 

0 

0.5 

0 

0 


0 

0 

0 

0 

0 

0.5 

0 

0.5 

0 



0 

0 

0 

0 

0 

0 

0 

1 -0/ 


Theorem A.2. //E is an K X I matrix of standard normals, then 


(A.2) 


Cov(vec(EE^),vec(E^E)) = 2VIkQk,i, 


where Qk,i is an x P matrix given by 


Qx,/(bj) 


(A/)-V2 t = ki + K{k, -l),j = k2 + I{k2 -1) k, = l,...,K k2 = I,..., I 
0 otherwise, 


Proof. Here, each entry of the above covariance matrix is obtained by taking two rows of E 
(possibly the same row) forming the inner product, taking two columns, taking their inner 
product, and then computing the covariance between the two. Due to the symmetry of this 
calculation, there are only three possible resulting values: when the two rows are different, 
then the two columns are different, or when both the rows and columns are different. When 
both rows and columns are different, we can (without loss of generality) take the hrst two 
rows and columns. In that case, the covariance becomes 

( K I \ K I 

E ei,fce2,fc, E ei,iei,2 I - EE Cov (ei,fce2,fc, e*,16^,2) • 

k=l i=l / k=l i=l 
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However, every summand above is zero when i > 2 or k > 2 since they will then involve 
independent variables. Therefore, we can express the above as 


Cov(ei,162,1 + 61,262,2, 61,161,2 + 62,162,2) = 0. 


Hence, any term with two different rows and columns is zero. A similar result will hold when 
there are either two different rows or two different columns. The only nonzero term will stem 
from taking the same row and same column, in which case the value becomes 

Cov(6i, 161,1 + 61,261,2,61,161,1 + 62,162,1) = Var(6i 1 ) = 2. 

Therefore, every nonzero entry will be 2. We now only need to determine which entries of 
the covariance matrix correpond to taking the same row and same column. Considering the 
structure induced by vectorizing, the hrst row of the covariance matrix and every subsequent 
K rows will correspond to matching the same row of E. Similalry, the first and every 
subsequent I column will correspond to matching the same colmun of E. This corresponds 
to our dehnition and the result follows. 

□ 


Some examples Q,k,i are 


and 


Q2,3 — 


Q2,2 — 


0 

0 0 

0 0 

^6-^/2 Q 


/ 0.5 

0 

0 

0.5\ 

0 

0 

0 

0 


0 

0 

0 

0 


^0.5 

0 

0 

0.5/ 

0 0 

6-1/2 

0 

0 

0 0 

0 


0 

0 

0 0 

0 


0 

0 

0 0 

6-1/2 

0 

0 


0 

0 

0 

0 


6-i/2\ 

0 

0 

6-1/2 y 


Before the next theorem, we dehne the matrix Q,r^k via a pseudo code. 


Begin Code 

Set CIr^k to be an by matrix of zeros 

For / = 1,..., 

/ = 1 + L(^ - 1)/{K^I)\ 
k = l)K^T)/{KI)\ 

p = 1 + L(i - 1 - (/ - l)K^I -{k- l)KI)/{K)\ 
m = 1)K^I -{k- 1)KI - (p - 1)K 

If (p = / and m ^ k) then Q_r,k [i,m + {k - l)K] = l/{2^/I) 
and [i, k + {m — l)K] = 1/(2a/7) 

If (p = / and m = k) then Q_r,x[/, m + {m — 1)K] = 1/\/l 
End For Loop 
End Code 
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Theorem A.3. IfE is an K x I matrix of standard normals and = vec(E), then 
(A.3) Cov(vec(EoEj), vec(EE^)) = 2VTQFt,K, 

where Qr^k is the x matrix defined by the pseudo code above. 

Proof. Begin by considering the {i,j) of the desired covariance matrix. There exists indices 
such that the {i,j) entry is equal to 

Cov(em,pCfc,Z) Cj, Cgfi 

where m, k, r, s take values 1,..., A and p, I take values 1,..., J. Moving from (r, s) to j we 
have that 

j = 1 + (r - 1) + (s - 1)A, 
and the reverse is obtained using 

5 = 1 + L(j - 1)/AJ 
r = j — {s — 1)A. 

Moving from {m,p, k, 1) to i we have that 

i = 1 + (m - 1) + (p - 1)A + (A; - 1)A/ + (/ - 1)K^I. 

We can move back to {m,p, k, 1) from i using 

/ = 1 + L(^ - 1)/{K^I)] 
k = l+[{i-l-{l- 1)A2/)/(A/)J 
p = 1 + [(i - 1 - (/ - l)K^I -{k- l)A/)/(A)J 
m = i-{l- 1)A^J -{k- 1)AJ -{p- 1)A 

We can see that the covariance will be zero if any one of m,k,r or s is distinct. Thus, 
the only nonzero entries will correspond to either m = r = k = s, m = r^k = s, or 
m = s ^ k = r (when m = k ^ r = s we get zero). When all four are equal we get that 

Cov{em,pem,h e^e^) = Cov{e^^pem,i, 

which will be zero unless p = I, in which case it equals 

Cov(e^^p, = (E[e^ p] - E[e^ /) = 2. 

We have therefore established the hrst ^/-statement in the pseudo code. 

Turning to the next case, when m = r ^ k = s, we have that 

“ 1 “ 

which is again only nonzero when p = I, in which case it will be 

Cov(em,pCfc,p) E[e^ p] E[e^ p] 1. 

An identical result will hold for when m = s ^ k = r, which gives both the second and third 
^/-statements in the pseudo code, and the proof is established. □ 
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One example of Qr^k is 


/ 2 - 1/2 0 0 0 \ 
0 8-1/2 g-l/2 Q 

0 0 0 0 

0 0 0 0 

0 8-1/2 g-l/2 Q 

0 0 0 2 - 1/2 

0 0 0 0 



0 0 0 0 

2-1/2 0 0 0 

0 8-1/2 g-l/2 Q 

0 0 0 0 

0 0 0 0 

0 8-1/2 g-l/2 Q 

\ 0 0 0 2-1/2/ 

For the last Q matrix, we also use pseudo code which is only slightly different from the 
code defining Qr^k- 
Begin Code 

Set Qrj to be an by P matrix of zeros 

For f = 1 ,..., i?2 

/ = 1 + L(^ - l)/iK^I)\ 
k = l+[{i-l-{l- 1)K^I)/{KI)\ 
p = 1 + [(/ - 1 - (/ - l)K^I -{k- l)KI)/{K)\ 
m = i-{l- l)K^I -{k- 1)KI -{p- 1)K 
li {m = k and p ^ 1) then Qrj/jP + (/ — pl] = 
and QR,i[i,l + (p - 1)1] = l/{2y/K) 
li {m = k and p = 1) then / + (/ — 1)/] = I/^/K 

End For Loop 
End Code 

Theorem A.4. IfEisanKxI matrix of standard normals and = vec(E), then 
(A.4) Cov(vec(EoEj),vec(ETE)) = 2VkQrj, 

where Qrj is the x P matrix defined by the pseudo code above. 

Proof. Begin by considering the {i,j) of the desired covariance matrix. There exists indices 
such that the (i, j) entry is equal to 

Cov(em,pefe,z,e/.)e(s)), 
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where m, k take values 1,... ,K and p, I, r, s take values 1,..., J. Moving from (r, s) to j we 
have that 

J = 1 + (r - 1) + (s - 1)/, 
and the reverse is obtained using 

S = 1 + L(j - i)//J 

r = j -{s- 1)1. 

Moving from {m,p, k, 1) to i we have that 

i = 1 + {m - 1) + {p - 1)K + {k - 1)KI + {I - 1)K^I. 

We can move back to {'m,p, fc, 1) from i using 

/ = !+[(*- 1)/{K^I)\ 
k = l+[{t-l-{l- 1)K^I)/{KI)\ 
p = 1 + L(z - 1 - (/ - l)K^I -{k- 1)KI)/{K)\ 
m = 1)1^K -{k- l)KI -{p- l)K. 

We can see that the covariance will be zero if any one of p,l,r or s is distinct. Thus, the 
only nonzero entries will correspond to either p = r = l = s,p = r^l = s, or p = s^l = r 
(when p = l^r = s we get zero). When all four are equal we get that 

Cov(em,pCfc,p) ^(p)^(p)) Cov(^eYn,pGk,py ^m,p "h 

which will be zero unless m = k, in which case it equals it will be equal to 2. We have 
therefore established the first ^/-statement in the pseudo code. 

Turning to the next case, when p = r ^ I = s, we have that 

Cov{em,pek,i, ejp^e^i)) = Cov{em Im 

which is again only nonzero when m = k, in which case it will be 

Cov(^efYi pCfYi i, efYi pCfYip E[e^ p] E[e^ J 1. 

An identical result will hold for when p = s ^ I = r, which gives both the second and third 
^/-statements in the pseudo code, and the proof is established. □ 


B Proof of Theorem 12.2 


We begin by establishing in Theorem B.l the joint null limit distribution of the vectors 
vec(U — U),vec(V — V), and vec(S — S). We hrst dehne several matrices that appear in 
this distribution. Recall the Q matrices derived in Section the matrix Qx is defined in 
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(A.l), Qk,i in (A.2), Qr^k in (A.3), and Qrj in (A.4). Denote by (■)+ a generalized inverse. 
We define the following generalized information matrices: 


_1 /U-^/2 Q \ 

f IQk ^/YKQKA ^ U-V2 0 \ 

^ \VIKQi,k KQj J V 0 V-V2 0 v-1/2J ’ 

=(D(DTXu,vD)+D^) + , 


where D is an (A'^ + P) x (A'^ + — 1) matrix whose colnmns are orthonormal and are 

perpendicular to vec(Ix 2 +/ 2 ), and 

Xs = ® ^ 5 .- 172 ^^ 

where R = KI. 


Theorem B.l. Suppose Assumption 2.1 and decomposition (2.1) hold. Assume further that 
tr(U) = K. Then 


An 


/vec(U-U)\ 
vec(V - V) 
^^vec(S — S)y 


4 Ar(o,r). 


The asymptotic covariance matrix F is defined as follows. The asymptotic covariance of 
(vec(U — U), vec(V — V)) is given by (X^ y)+, o/vec(S — S) is given by Z^, and the cross 
covariance matrix between the two is 



' U -^/2 (g) U - 1/2 

0 


0 

V-1/2 (g y-1/2 



Vkqi_ 




Proof. From standard theory for MLEs, Ferguson (1996) Chapter 18, we can use the partial 
derivatives of the log likelihood function (score equations) to find the Fisher information as 
well as asymptotic expressions for the MLEs. One can show that the cross terms of the 
Fisher information involving M and V, U and S are all zero, meaning that the estimate of 
the M is asymptotically independent of U, V and S. We therefore treat in the following M 
as known. 

We star t by working with U and V. Applying the constrained likelihood methods de¬ 
scribed in Moore et al. (2008), asymptotically, U and V are jointly normally distributed 


with means U and V and covariance given by the generalized inverse of the constrained 
Fisher information matrix. Starting with U we have that unconstrained score equation is 
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given by 


1 a/(M,U,V) 

7n au 

1 ^ 

= ^ 5^[U-1(X„ - M)V“1(X„ - M)TU-' - TU-i] 

'' n=l 
1 ^ 

= - TU"^] 

2y]V^ 

n=l 

1 ^ 

= VTf E[U""''|e»eI - -rn.„(]u-'/y 

^ n=l 

To get a handle on the nnconstrained Fisher information matrix (and therefore the covariance 
matrix), it will be easier to work with the vectorized version 


N 


2y/N 


(U-^/2 ® U-^/2) ^vec[E„E;[ - II 


KxK\- 


n=l 


Notice that we will have a complete handle on the above if we can nnderstand the form 
for the covariance of vec[E„E^ — However, this is a term that in no way depends 

on the nnderlying parameters as it is composed entirely of iid standard normals. We label 
Qk = (2/)“^ Cov(vec[E„E^ — Jl^xi^]) and its explicit form is given in 
The part of the Fisher information matrix for vec(U) is given by 

Identical arguments give that the part of the Fisher information matrix for vec(V) is given 
by 

The joint unconstrained Fisher information matrix for U and V is given by 


(A.l). 


^u,v = X 


^ JJ-l/2 Q 

0 V-l/2(^V-l/2 


/ IQk VTKQi,k\ /U-V2 0 U-V2 0 \ 

yVlKQi,K KQi V 0 V-V2 0V-1/2J’ 


where Qk,i is defined in (A.2). The constrained version is then given by 


X{jY = (D(D^Xu,vD)D+)+. 

Recall that D is an {K‘^ + P) x + P — 1) matrix whose columns are orthonormal and 
are perpendicular to vec(Ij^ 2 _,_/ 2 ). The form for D come from the gradient of the constraint 
tr(U) = K. 
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The last piece we need is the joint behavior of U (or V) and the estimator S. The score 
eqnation for S can be expressed as 


as 


N 


( S(i" - 


2 

N 


^n=l 
N 




2 2 

N 


Kn=l 


= Y,EnE: 


I 


KIxKI 




vn=l 


Using the same arguments as before, we get that Fisher information matrix for vec(S) is 


For the joint behavior, we use the following asymptotic expression for the MLEs, Ferguson 


(1996) Chapter 18, 

Vn 

and 


vec(U - U) 
vec(V - V) 


^/N 




u,v) 


vec ( MMdhX)' 

/ ai(M,u,v)' 

9V 


+ Op(l), 


vec 


^vec(E - E) = + o,,(i). 


For the covariance between S and U (or V) we obtain two more matrices, called Qr^k and 
Qrj which satisfy 


Cov [vec(E^„Ej„ — Ipxp), vec(E„E^ — — ‘2\/IQr^k, 

Cov [vec(Eo„Ej„ - Ipxp), vec(E^E„ - iFI/x/)] = 2y/KQRj. 

Recall that the diamond subscript indicates vectorization and the dehnitions of Qr^k and 
Qrj can be found in (A.3) and (A.4), respectively. The cross-covariance matrix for a//V vec(S — 
S) and \/N vec(U — U) is then given by 


//vec fiv-i/ 2 Wxu \\ . 


“ 2^^u,v) 


U-1/2 

0 


V-V2 0V-1/V [VkqIj 


as 


(S-^/2 0 S-^/2)X^ 


□ 
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Proof of Theorem 2.2: Since we have the joint asymptotic distribution for U, V, and S, we 
can use the delta method to hnd the asymptotic distributions of desired test statistics, and 
in particular, we can hnd the form of W, the asymptotic covariance matrix of vec(V (8) U) — 
vec(S). To apply the delta method, we need the partial derivatives. Taking the derivative 
with respect to Vij yields 


vecfl 




U) 


and with respect to Uk,i 


vec(V 0 lk,i). 

So the matrix of partials with respect to vec(V) is 


Gv = 


/vec(li,i 0U)^\ 
vec(l 2 ,i 0U)^ 

\vec(l/,/ 0 U)^/ 


with respect to vec(U) is 


Gu = 


/ vec(V0 

vec(V 0 l 2 ,i)^ 


Vvec(V 0 1k,kV ) 

and with respect to vec(S) is just (—1) times the K1 x KI identity matrix. We therefore 
have that 


vec(V 0 U) - vec(S) 

This implies that 
(B.l) W = 



f Go \ 

/vec(U)\ 

^ 1 


vec(V) 


\—1stxST/ 

Uec(S)/ 



The degrees of freedom are obtained by noticing that under the alternative S has KI{KI+ 
l)/2 free parameters, while under the null there K{K +l)/2 + J(/ + l)/2 — 1, where the last 
—1 is included because we have one constraint (tr(U) = K). 
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